Skip to content

refactor(ltm): redesign long-term memory with append-only incremental contexts#8144

Closed
RC-CHN wants to merge 55 commits into
AstrBotDevs:masterfrom
RC-CHN:refactor-ltm
Closed

refactor(ltm): redesign long-term memory with append-only incremental contexts#8144
RC-CHN wants to merge 55 commits into
AstrBotDevs:masterfrom
RC-CHN:refactor-ltm

Conversation

@RC-CHN
Copy link
Copy Markdown
Member

@RC-CHN RC-CHN commented May 11, 2026

Motivation

Fixes #8080
Rewrite the long-term memory (LTM) module from a ring buffer to an append-only architecture that keeps context prefixes stable across requests — enabling KV cache hits and the associated cost discounts (typically 1/10 of standard pricing across OpenAI, Anthropic, DeepSeek, and cloud providers).

Modifications / 改动点

Core: astrbot/builtin_stars/astrbot/long_term_memory.py

  • Replace max_cnt ring buffer with raw_records (deque) + _raw_cursor + contexts (append-only list). Old segments are never rebuilt.
  • _build_segments() converts raw chat lines into OpenAI-format context segments, handling tool calls, parallel tools, and multi-step chains.
  • <BOT/> markers replace [You/] to avoid nickname collisions.
  • on_agent_done records tool-call chains and now includes the @bot prompt in contexts so future rounds see the user's original message.
  • asyncio.Lock for concurrency safety; remove_session() for cleanup.

Hook wiring: astrbot/builtin_stars/astrbot/main.py

  • Swap @on_llm_response@on_agent_done for accurate tool-chain recording.
  • Lazy toggle detection: false→true cleans stale state on next message.
  • group_icl_enable=true skips Conversation DB query (conversation=None).

Config: astrbot/builtin_stars/astrbot/default.py

  • Default context_limit_reached_strategy"llm_compress".

Agent runner: astrbot/core/astr_main_agent.py

  • _get_compress_provider auto-falls back to the main chat provider when llm_compress_provider_id is unset, preventing silent truncation.

Tests: tests/unit/test_long_term_memory.py (new, 47 tests)

  • Pure functions: extract, parse, truncate, build_segments (31 tests).

  • Integration: round-trip lifecycle, multi-round accumulation, tool chains, persona preservation, concurrent safety (16 tests).

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

Tested on personal self-hosted astrbot.
image


Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Refactor the long-term memory subsystem to use an append-only, incremental context architecture and integrate it with agent completion hooks, while improving default compression behavior and regression coverage.

Enhancements:

  • Redesign group chat long-term memory to store raw messages and derived contexts in an append-only structure with concurrency-safe trimming and incremental segment building for LLM requests.
  • Update main agent wiring to build LLM contexts at request time, record full agent tool-call chains after completion, and lazily reset long-term memory state when toggling group ICL.
  • Change the default context limit reached strategy to use LLM-based compression instead of truncating by turns.
  • Allow the compression pipeline to fall back to the primary chat provider when no dedicated compression provider is configured or available.

Tests:

  • Add an extensive test suite for the new long-term memory implementation, covering parsing helpers, segment construction, multi-round accumulation, tool-chain recording, extreme inputs, persona interaction, and concurrency behavior.

Summary by Sourcery

Refactor group long-term memory to an append-only, incrementally built context model integrated with agent completion hooks, while tightening context compression behavior and isolating request-time context guarding from persistent history management.

Enhancements:

  • Redesign long-term memory to store raw group messages and derived LLM contexts in an append-only structure with configurable truncation and optional LLM-based summarization, including tool-call chains and bot replies.
  • Introduce a request-scoped context guard in the agent runner so per-request truncation/compression no longer mutates persistent conversation history, and adjust provider payload construction accordingly.
  • Adjust group ICL wiring to build contexts at request time, record agent results via the agent-done hook, and lazily reset LTM state when toggling group memory on or off.
  • Expand provider LTM and main agent configuration to support richer compaction controls, history tool-result truncation, raw record size limits, and improved default context compression strategy (LLM-based by default).

Tests:

  • Add an extensive unit test suite for the new long-term memory implementation covering parsing helpers, segment construction, multi-round accumulation, tool-chain recording, extreme inputs, persona interaction, and concurrency behavior.

@auto-assign auto-assign Bot requested review from advent259141 and anka-afk May 11, 2026 01:38
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 11, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The MAX_* limits (MAX_MSGS_PER_USER_SEGMENT, MAX_CHARS_PER_USER_SEGMENT, MAX_RAW_BYTES) are currently hard-coded; consider wiring these through configuration (e.g., provider_ltm_settings) so different deployments or groups can tune memory usage and retention behavior without code changes.
  • In _trim_raw_records, total is recomputed by summing len(s.encode()) on every call, which is O(n); if this runs frequently on busy groups, consider tracking a running byte-size counter per umo to avoid repeatedly traversing the deque.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The MAX_* limits (MAX_MSGS_PER_USER_SEGMENT, MAX_CHARS_PER_USER_SEGMENT, MAX_RAW_BYTES) are currently hard-coded; consider wiring these through configuration (e.g., provider_ltm_settings) so different deployments or groups can tune memory usage and retention behavior without code changes.
- In _trim_raw_records, total is recomputed by summing len(s.encode()) on every call, which is O(n); if this runs frequently on busy groups, consider tracking a running byte-size counter per umo to avoid repeatedly traversing the deque.

## Individual Comments

### Comment 1
<location path="astrbot/builtin_stars/astrbot/long_term_memory.py" line_range="290-299" />
<code_context>
+    # 裁剪
+    # =========================================================================
+
+    def _trim_raw_records(self, umo: str) -> None:
+        """仅淘汰 cursor 之前的条目。cursor 之后的绝不碰(issue #2)。"""
+        dq = self.raw_records[umo]
+        cursor = self._raw_cursor[umo]
+
+        # 1. 无条件清除 cursor 之前的条目(已消费)
+        while dq and cursor > 0:
+            dq.popleft()
+            cursor -= 1
+        self._raw_cursor[umo] = cursor
+
+        # 2. 按大小继续从前面淘汰(限制极端情况的总内存)
+        total = sum(len(s.encode()) for s in dq)
+        while total > MAX_RAW_BYTES and dq and cursor > 0:
+            removed = dq.popleft()
+            total -= len(removed.encode())
</code_context>
<issue_to_address>
**issue (bug_risk):** Size-based trimming branch is effectively dead due to cursor reset logic.

In `_trim_raw_records`, the first loop always decrements `cursor` to 0 and then writes it back to `self._raw_cursor[umo]`. As a result, in the size-based loop `while total > MAX_RAW_BYTES and dq and cursor > 0:`, `cursor` is always 0 and the loop never runs, so `MAX_RAW_BYTES` is never enforced.

To preserve the intended behavior (always drop fully-consumed entries, and then optionally drop additional consumed entries to satisfy `MAX_RAW_BYTES`), you’ll need to decouple the notion of “consumed index” from the deque length. For example, track how many entries are removed in the first loop and use that to derive which entries are safe to drop in the size-based phase, rather than relying on `cursor > 0` after the first loop.
</issue_to_address>

### Comment 2
<location path="tests/unit/test_long_term_memory.py" line_range="207-216" />
<code_context>
+    def test_tool_call_then_result_then_bot(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for `_build_segments` when a tool result appears without a preceding tool call.

Current `_build_segments` tests only cover well-formed tool flows (`<T:CALL>``<T:RES>``<BOT>`). Please add a case where a `<T:RES>` appears without a prior `<T:CALL>`, e.g.:

```python
def test_tool_result_without_call_then_bot(self):
    lines = [
        "<T:RES id=orphan>data</T:RES>",
        "<BOT/14:30>: ok",
    ]
    result = _build_segments(lines)
    # assert behavior: either a valid tool segment or clean ignore, no exception,
    # and an intact assistant segment.
```

This helps ensure `_build_segments` behaves predictably with partial or inconsistent histories.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread tests/unit/test_long_term_memory.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Long Term Memory (LTM) v2, which significantly improves chatroom memory management by implementing incremental context building, support for tool-call history, and memory-efficient message tracking using deques and cursors. The changes also include a fallback mechanism for LLM compression and extensive unit tests. Feedback focuses on several critical areas: a potential memory leak in the contexts dictionary which is currently append-only, a logic error in the size-based trimming of raw records that renders some code unreachable, and the risk of KeyError crashes when parsing malformed tool-call records. Additionally, there is a discrepancy between the system prompt's description of bot message markers and the actual role-based formatting sent to the LLM.

Comment on lines +50 to +51
self.contexts: dict[str, list[dict]] = defaultdict(list)
"""累积累积态 LLM 上下文。由 ContextManager 修改后保留。"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The self.contexts dictionary is append-only and never pruned. In long-running sessions or active group chats, this will lead to a memory leak as the list of segments grows indefinitely. While append-only contexts help with KV cache hits, you should still implement a maximum context length (e.g., based on the provider's window or a safe segment count) to prevent unbounded memory growth.

Comment on lines +156 to +161
async with self._lock:
umo = event.unified_msg_origin

# 记录写入前索引 → on_req_llm 精确排除(issue #1, #9)
raw_idx = len(self.raw_records[umo])
event.set_extra("_ltm_raw_idx", raw_idx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

handle_message appends to raw_records but never triggers trimming. In groups that rarely interact with the bot, raw_records will grow indefinitely because _trim_raw_records is only called during an agent run. Trimming should be performed here (before calculating raw_idx) to ensure memory usage remains bounded. Note that since this logic is synchronous and does not contain 'await' calls, it is executed atomically in the asyncio event loop and does not require an explicit lock.

        umo = event.unified_msg_origin
        self._trim_raw_records(umo)

        # 记录写入前索引 → on_req_llm 精确排除(issue #1, #9)
        raw_idx = len(self.raw_records[umo])
        event.set_extra("_ltm_raw_idx", raw_idx)
References
  1. In a single-threaded asyncio event loop, synchronous functions (code blocks without 'await') are executed atomically and will not be interrupted by other coroutines. Therefore, they are safe from race conditions when modifying shared state within that block.

Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
@RC-CHN RC-CHN marked this pull request as draft May 11, 2026 01:58
M1LKT and others added 18 commits May 18, 2026 14:20
…8153)

* chore: streamline convert_audio_to_opus logic

- Route Opus conversion directly through the underlying convert_audio_format.
- Remove redundant FFmpeg processing chains to improve code reusability.

* perf: optimize AMR voice encoding parameters

- Enhance AMR audio quality via built-in FFmpeg filters.
AstrBotDevs#8136)

* fix: handle None tool arguments returned by Claude API for no-parameter tools

* fix: handle None tool arguments from Claude API for no-parameter tools

* fix: generalize None tool args comment

* fix: generalize None tool args comment

* 去除空格,以保证格式正确
* fix: add ollama and nvidia embedding

* fix: address code review feedback for embedding providers

 - Remove redundant proxy branch in NvidiaEmbeddingProvider._get_client

 - Change ClientError handling to re-raise instead of wrapping in Exception

 - Add exc_info=True for better error diagnostics

 - Remove redundant isinstance check in OllamaEmbeddingProvider._build_payload
* fix: surface weixin media send failures

* fix: include weixin send failure context

* Delete tests/unit/test_weixin_oc_adapter.py

---------

Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>
)

* feat(lark): implement app registration and bot info retrieval

- Add app registration functionality for Lark and Feishu platforms, including endpoints and request handling.
- Introduce polling mechanism for app registration status.
- Create bot info retrieval functionality to fetch bot details after successful registration.
- Enhance dashboard with new UI components for one-click QR setup and manual setup options.
- Update internationalization files to support new features and actions.
- Add unit tests for app registration endpoint resolution and data handling.

* feat(weixin_oc): add WeChat login registration and QR code handling
…avoid crashes on invalid or empty values

* fix: add comments and await asyncio.sleep(0) for startup signal

* fix: [Bug] 修复 MiniMax TTS 空字符串配置解析报错

* fix: 采纳AI审查建议,添日志+提取默认配置变量

* fix: 移除误加的core_lifecycle.py改动

---------

Co-authored-by: RainBot-Ai <qianlanzhiya@gmail.com>
…#8015)

The WebUI only loaded Noto Sans SC (Simplified Chinese), which lacks
Cyrillic glyphs. Russian text fell back to system sans-serif, causing
poor rendering depending on the OS.

Changes:
- Load Noto Sans (regular) from Google Fonts alongside Noto Sans SC
- Add 'Noto Sans' at the END of $cjk-sans-fallback (after CJK fonts)
  so Chinese text still renders with system CJK fonts first,
  while Cyrillic text falls through to Noto Sans.

This ensures both Chinese and Cyrillic text render correctly.
…ffmpeg failure (AstrBotDevs#8009)

* fix: detect Tencent SILK (\x02 prefix) in audio magic bytes to avoid ffmpeg failure

QQ official bot sends voice in Tencent SILK format (leading \x02 byte before
#!SILK_V3 magic). _get_audio_magic_type() had two off-by-one slice errors:

  1. Standard SILK:  header[:8]  vs b'#!SILK_V3' (8 != 9 bytes) — never matched
  2. Tencent SILK:   not detected at all

Fixes:
  - Standard SILK:  header[:9]  == b'#!SILK_V3'   (correct 9-byte slice)
  - Tencent SILK:   header[:1] == b"\x02" and header[1:10] == b'#!SILK_V3'
  - ensure_wav() routes detected silk to tencent_silk_to_wav()

Before: QQ voice → ffmpeg → 'Invalid data found'
After:  QQ voice → magic detects silk → tencent_silk_to_wav → WAV OK

* refactor: use startswith() for SILK magic byte detection

Replace manual slice comparisons with startswith() — cleaner, less
error-prone, and immune to off-by-one slice errors.

Suggested by: sourcery-ai
* fix(core): pass images through active replies

* fix: harden active reply image collection

* test: avoid logger coupling in active reply test

* Delete tests/unit/test_builtin_astrbot_main.py

---------

Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>
…xt (AstrBotDevs#8205)

PR AstrBotDevs#8015 added 'Noto Sans' to the Google Fonts link and CJK fallback list,
but the font was placed at the end of $cjk-sans-fallback where browsers
never reach it for Cyrillic text. The global $body-font-family also lacked
'Outfit' entirely, causing Vuetify to use CJK fonts as the primary face.

Changes:
- Remove 'Noto Sans' from the end of $cjk-sans-fallback (it is not a CJK font)
- Add 'Outfit' and 'Noto Sans' to $body-font-family before CJK fallbacks
- Update .Outfit class in _container.scss to match the new stack

This ensures:
- Latin text → Outfit
- Cyrillic text → Noto Sans (loaded by vite-plugin-webfont-dl)
- CJK text → Noto Sans SC / PingFang SC etc.

Fixes follow-up to AstrBotDevs#8015.
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 18, 2026
RC-CHN and others added 6 commits May 18, 2026 14:43
)

* feat(lark): implement app registration and bot info retrieval

- Add app registration functionality for Lark and Feishu platforms, including endpoints and request handling.
- Introduce polling mechanism for app registration status.
- Create bot info retrieval functionality to fetch bot details after successful registration.
- Enhance dashboard with new UI components for one-click QR setup and manual setup options.
- Update internationalization files to support new features and actions.
- Add unit tests for app registration endpoint resolution and data handling.

* feat(weixin_oc): add WeChat login registration and QR code handling
…#8015)

The WebUI only loaded Noto Sans SC (Simplified Chinese), which lacks
Cyrillic glyphs. Russian text fell back to system sans-serif, causing
poor rendering depending on the OS.

Changes:
- Load Noto Sans (regular) from Google Fonts alongside Noto Sans SC
- Add 'Noto Sans' at the END of $cjk-sans-fallback (after CJK fonts)
  so Chinese text still renders with system CJK fonts first,
  while Cyrillic text falls through to Noto Sans.

This ensures both Chinese and Cyrillic text render correctly.
…xt (AstrBotDevs#8205)

PR AstrBotDevs#8015 added 'Noto Sans' to the Google Fonts link and CJK fallback list,
but the font was placed at the end of $cjk-sans-fallback where browsers
never reach it for Cyrillic text. The global $body-font-family also lacked
'Outfit' entirely, causing Vuetify to use CJK fonts as the primary face.

Changes:
- Remove 'Noto Sans' from the end of $cjk-sans-fallback (it is not a CJK font)
- Add 'Outfit' and 'Noto Sans' to $body-font-family before CJK fallbacks
- Update .Outfit class in _container.scss to match the new stack

This ensures:
- Latin text → Outfit
- Cyrillic text → Noto Sans (loaded by vite-plugin-webfont-dl)
- CJK text → Noto Sans SC / PingFang SC etc.

Fixes follow-up to AstrBotDevs#8015.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] 关于Astrbot设置里面的那些让大模型API资费猛增的"毒点"

10 participants